Abstract — The frequency of traffic accidents in Egypt has been increasing in recent years for many reasons (human, place, and time). This paper aims to find the best model for the annual traffic accident statistics in Egypt from 2005 to 2015 and to predict the number of annual traffic accidents likely to occur in the future. The time series data of traffic accidents are analyzed using classical statistical methods and the Box-Jenkins methodology for building an ARIMA model.

Index Terms — Time series analysis, classical statistical 
methods, forecasting, ARIMA, traffic accident. 

I. INTRODUCTION 

Traffic accidents cause more than 5000 deaths and 22000 injuries of varying severity annually. Economic losses amount to 2 % of total national income, according to data from the Egyptian Society for Protection from Traffic Accidents. Traffic accidents are considered the second leading cause of death in Egypt, and 80 % of victims are between 15 and 45 years old (Ali (2009)). For that reason, this paper analyzes and predicts future traffic accidents using statistical methods.

Time series models have long been the basis for studying the behavior of a process or metric over a period of time. Their application areas include sales forecasting, weather forecasting, and inventory studies. In decisions that involve uncertainty about the future, time series models have been found to be among the most effective forecasting methods (Makridakis et al., 1998).

Time series data often have time-dependent moments (e.g. mean, variance, skewness, kurtosis). The mean or variance of many time series increases over time. This property is called nonstationarity. A stationary time series has a mean, variance, and autocorrelation function that are essentially constant through time.

Among the most important models of time series analysis is the ARIMA model, which was introduced by Box and Jenkins. The Box and Jenkins model assumes that the time series is stationary. For a nonstationary time series, Box and Jenkins recommend differencing the series one or more times to achieve stationarity. This produces an ARIMA model: autoregressive (AR), integrated (I), and moving average (MA).
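The differencing step described above can be illustrated with a minimal sketch. It assumes the annual counts listed in Table 1 of this paper; the function name `difference` is introduced here for illustration only and is not taken from any package used by the authors.

```python
# Annual accident counts for 2005-2015, copied from Table 1 of this paper.
accidents = [21352, 18061, 22900, 20938, 22793, 24371,
             16830, 15516, 15578, 14403, 14548]


def difference(series, d=1):
    """Apply first differencing d times: w_t = y_t - y_(t-1)."""
    out = list(series)
    for _ in range(d):
        # Each pass shortens the series by one observation.
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


first_diff = difference(accidents, d=1)
second_diff = difference(accidents, d=2)
print(first_diff)
```

Each round of differencing removes one observation, which matters for a series as short as this one (11 points become 10 after one difference).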

Ali (2009) identified three main factors (human, place, and time) that have the greatest effect on traffic accidents in Egypt. Momani (2009) presented a time series analysis of rainfall data in Jordan and used the Box-Jenkins methodology to build an ARIMA model for monthly rainfall data from the Amman airport station for the period 1922-1999, a total of 936 readings. An ARIMA(1,0,0)(0,1,1)12 model was developed.

Tularam and Mahbub (2010) examined a large data set covering more than 50 years of rainfall and temperature, using spectral analysis and time series (ARIMA) methodology to analyze climatic trends and interactions.

Balogun et al. (2014) analyzed a data set of Nigerian traffic accidents using a time series approach. The data spanned the period from 1989 to 2008. They found that the best model for the annual data was AR(1). Mutangi (2015) analyzed traffic accident data in Zimbabwe using three ARIMA models suggested by the ACF and PACF plots of the differenced series: ARIMA(0,1,0), ARIMA(1,1,0), and ARIMA(1,1,1). He concluded that ARIMA(0,1,0) was the best model for the Zimbabwe annual traffic accident data.

II. The Time series analysis 

A time series is a sequential set of data points, typically measured over successive times. It is mathematically defined as a set of vectors $y_t$, $t = 0, 1, 2, \dots$, where the subscript $t$ is the time point at which $y$ is observed (Pankratz (1983)). According to Chatfield (1987), a time series is a collection of observations made sequentially in time at regular intervals. Four factors affect time series observations: the trend effect, the seasonal effect, the cyclical effect, and random variation. The majority of time series contain a trend effect, either increasing or decreasing, so it is the most important effect to study when analyzing a time series. This analysis can be done by several classical approaches, such as the least squares method (OLS), the matrices method, the semi average method, the quadratic trend model, and the moving average (MA) method.

The Box-Jenkins methodology, known through ARIMA models, is introduced next. An ARIMA model handles a non-stationary time series by making it stationary through finite differencing of the data points. The model is written ARIMA(p, d, q), where p, d, and q are integers greater than or equal to zero that refer to the order of the autoregressive part (p), the order of differencing (d), and the order of the moving average part (q). The integer d controls the level of differencing; d = 1 is generally enough in most cases. When d = 0, the model reduces to an ARMA(p, q) model. The linear regression (trend) model is written as follows.

$$y_t = \beta_0 + \beta_1 t + \varepsilon_t \tag{2.1}$$

where $y_t$, $\beta_0$, $\beta_1$, and $\varepsilon_t$ are the current observation, the constant of the regression line, the regression coefficient, and the random errors, respectively. The random errors are independent and identically distributed (i.i.d.), following a normal distribution with mean zero and constant variance. The estimate of equation (2.1) can then be written as shown in equation (2.2).


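$$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 t \tag{2.2}$$

where the coefficient estimates are

$$\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} t_i y_i - \sum_{i=1}^{n} t_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} t_i^{2} - \left(\sum_{i=1}^{n} t_i\right)^{2}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{t} \tag{4.1}$$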


III. The data set 

The time series data set (shown in Table 1 and figure 1) 
presents the number of the traffic accidents according to the 
Central Agency for Public Mobilization and Statistics 
(CAPMAS) in Egypt. 

Table 1: The Traffic Accidents from 2005 to 2015

Year    Accident Numbers
2005    21352
2006    18061
2007    22900
2008    20938
2009    22793
2010    24371
2011    16830
2012    15516
2013    15578
2014    14403
2015    14548



IV. The classical methods 

This section presents the classical methods used to analyze the data set (in Table 1) and assesses the accuracy of each method using accuracy measures such as the mean absolute deviation (MAD), the mean absolute percentage error (MAPE), and the mean square error (MSE). The classical time series analysis method decomposes the time series function $y_t = f(t)$ into up to four components (McClave and Synch (2001)).

1. Trend: a long-term monotonic change of the average level 
of the time series. 

2. The Trade Cycle: a long wave in the time series. 

3. The Seasonal Component: fluctuations in time series that 
recur during specific time periods. 

4. The Residual component: the influences on the time series 
that are not explained by the other three components. 

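4.1 The Least Squares Method (OLS)

Equation (2.2) is solved using the OLS method, with the coefficient estimates given by (4.1), where

$$\bar{y} = \frac{\sum_{t=1}^{n} y_t}{n} \quad \text{and} \quad \bar{t} = \frac{\sum_{t=1}^{n} t}{n} \tag{4.2}$$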

By solving (4.1) and (4.2), the regression line can be written as follows.

$$\hat{y}_t = 23613.1 - 794.77\,t \tag{4.3}$$

The predictive values are shown in Table 2, the graph of the predictive and observed data is shown in Figure 2, and the statistical analysis is presented in Tables 3, 4, and 5.
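As a minimal sketch of the closed-form OLS computation, the following applies the standard least squares formulas for a linear trend to the Table 1 counts, with the time index t = 1, ..., 11. The function name `ols_trend` is hypothetical; the paper's own figures were produced with statistical software, not with this code.

```python
# Annual accident counts for 2005-2015, copied from Table 1 of this paper.
accidents = [21352, 18061, 22900, 20938, 22793, 24371,
             16830, 15516, 15578, 14403, 14548]


def ols_trend(y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * t,
    with t = 1, 2, ..., n."""
    n = len(y)
    t = range(1, n + 1)
    sum_t = sum(t)
    sum_y = sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_t2 = sum(ti * ti for ti in t)
    slope = (n * sum_ty - sum_t * sum_y) / (n * sum_t2 - sum_t ** 2)
    intercept = sum_y / n - slope * sum_t / n
    return intercept, slope


b0, b1 = ols_trend(accidents)
print(round(b0, 3), round(b1, 3))
```

The intercept and slope obtained this way agree with the unstandardized coefficients reported in Table 5.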


Table 2: The Predictive Values using OLS

Year (t)     y_t      ŷ_t
1 (2005)     21352    22818.66
2 (2006)     18061    22024.22
3 (2007)     22900    21229.78
4 (2008)     20938    20435.34
5 (2009)     22793    19640.9
6 (2010)     24371    18846.46
7 (2011)     16830    18052.02
8 (2012)     15516    17257.58
9 (2013)     15578    16463.14
10 (2014)    14403    15668.7
11 (2015)    14548    14874.26
12 (2016)             14075.86



Figure 2: The observation and predictive values using 
OLS method 


Table 3: Model Summary

Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .710 a   .504       .449                2756.30749

a. Predictors: (Constant), year

Table 4: ANOVA a

Model          Sum of Squares    Df   Mean Square     F       Sig.
1 Regression   69483005.682      1    69483005.682    9.146   .014 b
  Residual     68375079.045      9    7597231.005
  Total        137858084.727     10

a. Dependent Variable: number of accident
b. Predictors: (Constant), year

Table 5: Coefficients a

Model        Unstandardized Coefficients   Standardized Coefficients   t        Sig.
             B           Std. Error        Beta
(Constant)   23613.182   1782.421                                      13.248   .000
year         -794.773    262.804           -.710                       -3.024   .014

a. Dependent Variable: number of accidents


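$$\mathrm{MAD} = \frac{1}{n-2}\sum_{t=1}^{n}\lvert e_t\rvert = 2413.34 \tag{4.4}$$

$$\mathrm{MSE} = \frac{1}{n-2}\sum_{t=1}^{n} e_t^{2} = 7597236 \tag{4.5}$$

$$\mathrm{MAPE} = \frac{1}{n-2}\sum_{t=1}^{n}\left\lvert\frac{e_t}{y_t}\right\rvert \times 100 = 11 \tag{4.6}$$

where $e_t = y_t - \hat{y}_t$.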
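The accuracy measures can be sketched in a few lines. The example below assumes the n - 2 divisor that appears in the formulas of this section and uses the coefficients from Table 5 to build the fitted values; the helper `accuracy_measures` and its `lost_df` parameter are illustrative names, not part of the paper. Because the reported values were computed from rounded quantities, the output may differ slightly from (4.4) to (4.6).

```python
# Annual accident counts for 2005-2015, copied from Table 1 of this paper.
accidents = [21352, 18061, 22900, 20938, 22793, 24371,
             16830, 15516, 15578, 14403, 14548]

# Intercept and slope of the fitted trend line, as reported in Table 5.
b0, b1 = 23613.182, -794.773


def accuracy_measures(y, fitted, lost_df=2):
    """Return (MAD, MSE, MAPE in percent) using the divisor n - lost_df."""
    errors = [obs - fit for obs, fit in zip(y, fitted)]
    denom = len(y) - lost_df
    mad = sum(abs(e) for e in errors) / denom
    mse = sum(e * e for e in errors) / denom
    mape = 100.0 * sum(abs(e / obs) for e, obs in zip(errors, y)) / denom
    return mad, mse, mape


fitted = [b0 + b1 * t for t in range(1, len(accidents) + 1)]
mad, mse, mape = accuracy_measures(accidents, fitted)
print(mad, mse, mape)
```

The MSE computed this way is the residual mean square of Table 4, which is a useful cross-check on the regression output.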


4.2 The Matrices Method 

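To solve equation (2.2) using matrices, we can rewrite (2.2) as follows.

$$\hat{Y} = X\hat{\beta} \tag{4.7}$$

where

$$\hat{\beta} = (X^{\top}X)^{-1}X^{\top}Y = \begin{pmatrix} n & \sum t \\ \sum t & \sum t^{2} \end{pmatrix}^{-1} \begin{pmatrix} \sum y_t \\ \sum t\,y_t \end{pmatrix} = \frac{1}{n\sum t^{2} - \left(\sum t\right)^{2}} \begin{pmatrix} \sum t^{2} & -\sum t \\ -\sum t & n \end{pmatrix} \begin{pmatrix} \sum y_t \\ \sum t\,y_t \end{pmatrix} = \begin{pmatrix} 23613.2 \\ -794.76 \end{pmatrix} \tag{4.8}$$

and all sums run over $t = 1, \dots, n$.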


Table 6: The Predictive Values using MA method

y_t      Moving summation (MS)   ŷ_t = MS / 3 (MA period)
21352
18061    62313                   20771
22900    61899                   20633
20938    66631                   22210.33
22793    68102                   22700.67
24371    63994                   21331.67
16830    56717                   18905.67
15516    47924                   15974.67
15578    45497                   15165.67
14403    44529                   14843
14548



Figure 3: The observation and predictive values using 
MA method 


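$$\mathrm{MAD} = \frac{1}{n-2}\sum_{t=1}^{n}\lvert e_t\rvert = 1418.62 \tag{4.10}$$

$$\mathrm{MSE} = \frac{1}{n-2}\sum_{t=1}^{n} e_t^{2} = 3136740.44 \tag{4.11}$$

$$\mathrm{MAPE} = \frac{1}{n-2}\sum_{t=1}^{n}\left\lvert\frac{e_t}{y_t}\right\rvert \times 100 = 16 \tag{4.12}$$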


Then, for the matrices method of Section 4.2,

$$\hat{y}_t = 23613.1 - 794.77\,t \tag{4.9}$$

This method gives approximately the same results as the OLS method.
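A minimal sketch of the matrices method, assuming the usual normal equations for a two-column design matrix (a column of ones and the time index t = 1, ..., n). The 2x2 inverse is written out by hand so that no external library is needed; `normal_equations` is a hypothetical helper name.

```python
# Annual accident counts for 2005-2015, copied from Table 1 of this paper.
accidents = [21352, 18061, 22900, 20938, 22793, 24371,
             16830, 15516, 15578, 14403, 14548]


def normal_equations(y):
    """Solve (X'X) beta = X'Y for the design matrix X = [1, t], t = 1..n."""
    n = len(y)
    t = list(range(1, n + 1))
    # Entries of the symmetric 2x2 matrix X'X = [[n, s_t], [s_t, s_t2]].
    s_t = sum(t)
    s_t2 = sum(ti * ti for ti in t)
    # Entries of the vector X'Y.
    s_y = sum(y)
    s_ty = sum(ti * yi for ti, yi in zip(t, y))
    # Explicit inverse of a 2x2 matrix: swap the diagonal, negate the
    # off-diagonal, divide by the determinant.
    det = n * s_t2 - s_t * s_t
    beta0 = (s_t2 * s_y - s_t * s_ty) / det
    beta1 = (-s_t * s_y + n * s_ty) / det
    return beta0, beta1


beta0, beta1 = normal_equations(accidents)
print(round(beta0, 2), round(beta1, 2))
```

Since the normal equations are algebraically the same system as the scalar OLS formulas, the two approaches should agree up to rounding.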

4.3 The Moving Average (MA) Method 

The moving average (MA) method is one of the widely known technical indicators used to predict future data in time series analysis. The moving average is extremely useful for forecasting long-term trends, where the average represents the “middling” value of a set of numbers. First, the period of the moving average has to be decided; for this data set, a period of 3 values is used for the observations shown in Tables 5 and 6 and Figure 3.
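The 3-period calculation can be sketched as follows. The example assumes a centered moving average, which is what the layout of Table 6 suggests (each average sits next to the middle observation of its window); `centered_moving_average` is an illustrative name, and the sketch only handles odd periods.

```python
# Annual accident counts for 2005-2015, copied from Table 1 of this paper.
accidents = [21352, 18061, 22900, 20938, 22793, 24371,
             16830, 15516, 15578, 14403, 14548]


def centered_moving_average(y, period=3):
    """Return (moving sums, moving averages) for an odd window length.

    The first and last period // 2 observations have no complete window,
    so the result has len(y) - period + 1 entries.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only supports odd periods")
    half = period // 2
    sums = [sum(y[i - half:i + half + 1]) for i in range(half, len(y) - half)]
    averages = [s / period for s in sums]
    return sums, averages


moving_sums, moving_avgs = centered_moving_average(accidents, period=3)
for s, a in zip(moving_sums, moving_avgs):
    print(s, round(a, 2))
```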


4.4 The Semi Average Method 

To solve equation (2.2) using the semi average method, the time series data are divided into two halves, giving two equations that are solved for the parameters of (2.2). Because the data have an odd number of observations, observation number six must be deleted, as follows.

$$\bar{y}_1 = \beta_0 - \beta_1 \bar{t}_1, \qquad \bar{y}_1 = 21208.8,\ \bar{t}_1 = 3 \tag{4.13}$$

$$\bar{y}_2 = \beta_0 - \beta_1 \bar{t}_2, \qquad \bar{y}_2 = 15375,\ \bar{t}_2 = 9 \tag{4.14}$$

Then

$$\hat{y}_t = 24125.7 - 972.3\,t \tag{4.15}$$
The predictive values and real data are illustrated in Table (7) and Figure (4).
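A minimal sketch of the semi average calculation on the Table 1 counts, assuming t = 1, ..., 11 and dropping the middle observation when the number of observations is odd. The function `semi_average` is hypothetical, and it returns the slope with its sign, so a declining trend gives a negative value.

```python
# Annual accident counts for 2005-2015, copied from Table 1 of this paper.
accidents = [21352, 18061, 22900, 20938, 22793, 24371,
             16830, 15516, 15578, 14403, 14548]


def semi_average(y):
    """Fit a line through the mean points of the two halves of the series.

    With an odd number of observations the middle one is left out, so both
    halves have the same length.
    """
    n = len(y)
    half = n // 2
    t = list(range(1, n + 1))
    y1 = sum(y[:half]) / half
    t1 = sum(t[:half]) / half
    y2 = sum(y[n - half:]) / half
    t2 = sum(t[n - half:]) / half
    slope = (y2 - y1) / (t2 - t1)
    intercept = y1 - slope * t1
    return intercept, slope


sa_intercept, sa_slope = semi_average(accidents)
print(round(sa_intercept, 1), round(sa_slope, 1))
```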


Table 7: The Predictive Values using semi average

Year    y_t      ŷ_t
1       21352    23153.4
2       18061    22181.1
3       22900    21208.8
4       20938    20236.5
5       22793    19264.2
6       24371    18291.9
7       16830    17319.6
8       15516    16347.3
9       15578    15375
10      14403    14402.7
11      14548    13430.4



Figure (4): The observation and predictive values using semi average method


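$$\mathrm{MAD} = \frac{1}{n-2}\sum_{t=1}^{n}\lvert e_t\rvert = 67544 \tag{4.16}$$

$$\mathrm{MSE} = \frac{1}{n-2}\sum_{t=1}^{n} e_t^{2} = 835571322 \tag{4.17}$$

$$\mathrm{MAPE} = \frac{1}{n-2}\sum_{t=1}^{n}\left\lvert\frac{e_t}{y_t}\right\rvert \times 100 = 24.4 \tag{4.18}$$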

4.5 The quadratic trend model 


[Plot: Trend Analysis Plot for acc.no, Quadratic Trend Model, Yt = 20129 + 774 t - 126.8 t^2. Series shown: Actual, Fits. Accuracy measures: MAPE 10, MAD 1923, MSD 4967359.]

Figure (5): The observation and predictive values using quadratic trend model


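$$\mathrm{MAD} = \frac{1}{n-2}\sum_{t=1}^{n}\lvert e_t\rvert = 1923 \tag{4.22}$$

$$\mathrm{MAPE} = \frac{1}{n-2}\sum_{t=1}^{n}\left\lvert\frac{e_t}{y_t}\right\rvert \times 100 = 10 \tag{4.23}$$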


V. The Box and Jenkins methodology 

Box-Jenkins analysis refers to a systematic method of identifying, fitting, checking, and forecasting. Identification determines the appropriate values of p, d, and q using the ACF, PACF, and unit root tests (p is the AR order, d is the integration order, and q is the MA order). Estimation fits an ARIMA model using the chosen values of p, d, and q. Diagnostic checking examines the residuals of the estimated ARIMA model(s) to check whether they are white noise. Forecasting produces out-of-sample forecasts, or sets aside the last few data points for in-sample forecasting. The Box-Jenkins model assumes that the time series is stationary, and it recommends differencing a non-stationary series one or more times to achieve stationarity (Box et al. (1994)).

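An autoregressive integrated moving average (ARIMA) time series model is appropriate for time series of medium to long length.

$$y_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q} + \varepsilon_t \tag{5.1}$$

where $\phi_p$ are the parameters of the AR part, $\theta_q$ are the parameters of the MA part, and the random error $\varepsilon_t \sim (0, \sigma^{2})$.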


In some cases, a linear trend is inadequate to capture the trend of a time series. A natural generalization of the linear trend model is the polynomial trend model in equation (4.19).

$$y_t = \beta_0 + \beta_1 t + \beta_2 t^{2} + \dots + \beta_p t^{p} + \varepsilon_t \tag{4.19}$$

where p is a positive integer.

The quadratic trend model is the special case of the polynomial trend model with p = 2 (for economic time series we almost never require p > 2). Equation (4.19) can then be written as follows.

$$y_t = \beta_0 + \beta_1 t + \beta_2 t^{2} + \varepsilon_t \tag{4.20}$$

Using Minitab 17, the equation is estimated as follows.

$$\hat{y}_t = 20129 + 774\,t - 126.8\,t^{2} \tag{4.21}$$

Equation (4.21) and the predictive and actual values are illustrated in Figure (5).
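A polynomial trend of any degree can be fitted by solving the least squares normal equations. The sketch below is a plain Python illustration of that idea, not the routine used by Minitab; `polyfit_normal` is a hypothetical helper. It is checked here on synthetic data with a known quadratic, and then applied to the Table 1 counts. The coefficients it returns depend on the input values and on rounding, so they are not claimed to reproduce (4.21).

```python
def polyfit_normal(t, y, degree):
    """Least squares polynomial fit via the normal equations.

    Returns coefficients [b0, b1, ..., b_degree] of
    y = b0 + b1*t + ... + b_degree * t**degree.
    """
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum t^(i+j), b[i] = sum t^i * y.
    a = [[float(sum(ti ** (i + j) for ti in t)) for j in range(m)]
         for i in range(m)]
    b = [float(sum((ti ** i) * yi for ti, yi in zip(t, y))) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        known = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - known) / a[i][i]
    return coef


# Synthetic check: data generated exactly from y = 1 + 2t + 3t^2.
t_demo = [1, 2, 3, 4, 5, 6]
y_demo = [1 + 2 * ti + 3 * ti ** 2 for ti in t_demo]
demo_coef = polyfit_normal(t_demo, y_demo, degree=2)

# Quadratic trend on the Table 1 counts, t = 1..11.
accidents = [21352, 18061, 22900, 20938, 22793, 24371,
             16830, 15516, 15578, 14403, 14548]
quad = polyfit_normal(list(range(1, 12)), accidents, degree=2)
print(demo_coef, quad)
```

For higher degrees or longer series, the normal equations become ill-conditioned, and a library routine based on an orthogonal decomposition would be the safer choice.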


In this section the four steps (identification, estimation, checking, and forecasting) are discussed.

5.1 Autocorrelation Function (ACF) 

The Autocorrelation Function (ACF) computes the correlation between different lags of a series. The ACF ($\rho_k$) represents the degree of persistence over respective lags of a variable. The autocorrelation function is presented in Figure (6).

$$\rho_k = \frac{r_k}{r_0} = \frac{\sum_{t=k+1}^{n}(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n}(y_t - \bar{y})^{2}} \tag{5.2}$$

ACF(0) = 1, ACF(k) = ACF(-k)



Figure (6): The Autocorrelation Function 

Figure (6) shows that the ACF decreases quickly after one lag, which means that the AR model becomes stationary after one difference.
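The sample autocorrelation can be computed directly from its definition. The following sketch assumes the usual estimator, in which the lag-k autocovariance is divided by the lag-0 autocovariance, and applies it to the Table 1 counts; the source does not state which series (raw or differenced) underlies Figure (6), so the raw series here is an assumption. `sample_acf` is an illustrative name.

```python
# Annual accident counts for 2005-2015, copied from Table 1 of this paper.
accidents = [21352, 18061, 22900, 20938, 22793, 24371,
             16830, 15516, 15578, 14403, 14548]


def sample_acf(y, max_lag):
    """Return [rho_0, rho_1, ..., rho_max_lag] for the series y."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    acf = []
    for k in range(max_lag + 1):
        # Sum of cross products of deviations that are k steps apart.
        num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


acf = sample_acf(accidents, max_lag=3)
print([round(r, 3) for r in acf])
```

With only 11 observations, the sampling error of each autocorrelation is large, so small values should not be over-interpreted.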

5.2 The Partial Autocorrelation Function (PACF)

The Partial Autocorrelation Function (PACF) expresses information useful in determining the order of an ARIMA model (Box et al. (1994)). The PACF coefficient is the correlation between $y_t$ and $y_{t-k}$ after omitting the effect of $y_{t-1}, \dots, y_{t-k+1}$. The lag k partial autocorrelation is the partial regression coefficient $\phi_{kk}$ in the k-th order autoregression,

$$y_t = \phi_{k1} y_{t-1} + \phi_{k2} y_{t-2} + \dots + \phi_{kk} y_{t-k} + \varepsilon_t \tag{5.3}$$

PACF is represented in Figure (7).
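One standard way to obtain the partial autocorrelations from the autocorrelations is the Durbin-Levinson recursion. The sketch below uses that recursion as an illustration; it is not claimed to be the algorithm used by the software behind Figure (7), and `pacf_from_acf` is a hypothetical helper. The usage example feeds it the autocorrelations of an idealized AR(1) process (0.5 to the power k), for which every partial autocorrelation beyond lag one should vanish.

```python
def pacf_from_acf(r, max_lag):
    """Durbin-Levinson recursion.

    r[0] must be 1 and r[k] the lag-k autocorrelation.
    Returns [phi_11, phi_22, ..., phi_KK] with K = max_lag.
    """
    pacf = []
    prev = []  # prev[j] holds phi_(k-1, j+1) from the previous order
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            curr = [phi_kk]
        else:
            num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append phi_kk.
            curr = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            curr.append(phi_kk)
        pacf.append(phi_kk)
        prev = curr
    return pacf


# Autocorrelations of an idealized AR(1) process with coefficient 0.5.
ar1_acf = [0.5 ** k for k in range(4)]
ar1_pacf = pacf_from_acf(ar1_acf, max_lag=3)
print(ar1_pacf)
```

The sharp cut-off after lag one in this idealized case is the pattern that identification by PACF looks for when choosing the AR order.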



Figure (7): The Partial Autocorrelation Function 

Figure (7) shows that the PACF decreases quickly after one lag. This means that the MA model is of order one. From the ACF and PACF, we judge that the appropriate model for the number of traffic accidents is ARIMA(1,1,1), which concludes the identification step.

$$y_t = \mu + \phi_1 y_{t-1} - \theta_1 \varepsilon_{t-1} + \varepsilon_t \tag{5.4}$$

The parameters are then estimated; the estimation using SPSS 21 is presented in Tables 8 and 9. The residual statistics are also presented in Table 9.


RMSE              3491.332
MAPE              12.111
MSE               10282406
MAE               2317.429
MaxAE             4044.052
Normalized BIC    17.237


The diagnostic step depends on the ACF and PACF of the residuals, which are illustrated in Figures 8 and 9.

[Plot: ACF of Residuals for acc.no (with 5% significance limits for the autocorrelations); vertical axis: Autocorrelation, horizontal axis: Lag]

Figure 8: Autocorrelation for the residuals of model ARIMA(1,1,1)

[Plot: PACF of Residuals for acc.no (with 5% significance limits for the partial autocorrelations); vertical axis: Partial Autocorrelation, horizontal axis: Lag]

Figure 9: Partial autocorrelation for the residuals of model ARIMA(1,1,1)


Table 10: Model statistics

Model             Model Fit statistics                Ljung-Box Q(18)
                  Stationary R-squared    MAE         DF
acc.no-Model_1    0.275                   2340.675    0


Table 8: Parameter estimation

Type        Coef      SE Coef   T       P
AR1         0.2315    0.4662    0.50    0.635
MA1         0.9088    0.3957    2.30    0.055
Constant    -576.7    166.6     -3.46   0.011


Figures (8) and (9) and Table (10) show that the residuals of the model follow a random pattern. There is no correlation between the random errors (from the Ljung-Box statistic), which means that the model represents the data. This leads to the final step, forecasting, in which the forecasts and the current data are presented in Table 11 and Figure 10.


$$\hat{y}_t = -576.7 + 0.2315\,y_{t-1} - 0.9088\,\varepsilon_{t-1} \tag{5.5}$$
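A quick consistency check on the fitted model can be sketched as follows. It assumes that, because d = 1, equation (5.5) is applied to the first-differenced series; the source does not state this explicitly, so it is an assumption of this example. Under that assumption, multi-step forecasts of the yearly change follow the AR(1) recursion and converge to the constant divided by one minus the AR coefficient, which can be compared with the step between consecutive forecasts in Table 11.

```python
# Constant and AR(1) coefficient as reported in Table 8.
constant, phi1 = -576.7, 0.2315

# Long-run forecast of the yearly change, assuming (5.5) describes the
# first-differenced series (an assumption of this sketch).
long_run_change = constant / (1.0 - phi1)

# Consecutive out-of-sample forecasts for 2016 and 2017 from Table 11.
forecast_2016, forecast_2017 = 13563.7, 12813.3
table_step = forecast_2017 - forecast_2016

print(round(long_run_change, 1), round(table_step, 1))
```

If the two numbers are close, the forecasts in Table 11 are behaving like a steady yearly decline with the drift implied by the estimated parameters.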


Table 9: The Model Fit

Fit Statistic           Mean
Stationary R-squared    0.265
R-squared               0.408


Table 11: The Predictive Values using ARIMA(1,1,1); 95% limits.

Period   Forecast   Lower     Upper     Actual
2008     20165.1    13878.9   26451.4   20938.0
2009     18955.3    12350.0   25560.6   22793.0
2010     18098.5    11411.4   24785.6   24371.0
2011     17323.4    10586.    24059.9   16830.0
2012     16567.3    9787.8    23346.7   15516.0
2013     15815.5    8994.8    22636.2   15578.0
2014     15064.7    8203.3    21926.2   14403.0
2015     14314.2    7412.3    21216.1   15578.0
2016     13563.7    6621.7    20505.8
2017     12813.3    5831.3    19795.3
2018     12062.8    5041.1    19084.5
[Plot: Time Series Plot for acc.no (with forecasts and their 95% confidence limits)]

Figure (10): The observation and predictive values using ARIMA(1,1,1)


VI. The Conclusions 

The global rate of road deaths per 10 thousand vehicles ranges between 10 and 12, but in Egypt it reaches 25, about twice the world average. The road death toll per 100 km in Egypt is 131, while the global average ranges between 4 and 20, which means that the rate in Egypt is more than 30 times the global average. Accident severity is also high: Egypt has 22 deaths per 100 injured, while the global average is 3 deaths per 100 injured.

Therefore, this paper has presented the results of a statistical analysis of traffic accident data in Egypt using several models. The data were analyzed by classical methods and by the Box and Jenkins method. Among the classical methods, the quadratic trend model was the best because it had the lowest values of the accuracy measures (MAPE, MAD, and MSE). The Box and Jenkins model was the best at representing the time series data. Because the classical method treats the regression relationship as deterministic, it is very sensitive to any data update. Time series have many stochastic trends, and the Box and Jenkins model can be modified to accommodate any data update and then checked again.